The sensory-triggered activity of a neuron is typically characterized in terms of a tuning curve,
which describes the neuron's average response as a function of a parameter that characterizes a 
physical stimulus. What determines the shapes of tuning curves in a neuronal population? Pre- 
vious theoretical studies and related experiments suggest that many response characteristics of 
sensory neurons are optimal for encoding stimulus-related information. This notion, however, does 
not explain the two general types of tuning profiles that are commonly observed: unimodal and 
monotonic. Here, I quantify the efficacy of a set of tuning curves according to the possible down- 
stream motor responses that can be constructed from them. Curves that are optimal in this sense 
may have monotonic or non-monotonic profiles, where the proportion of monotonic curves and the 
optimal tuning curve width depend on the general properties of the target downstream functions. 
This dependence explains intriguing features of visual cells that are sensitive to binocular disparity 
and of neurons tuned to echo delay in bats. The numerical results suggest that optimal sensory 
tuning curves are shaped not only by stimulus statistics and signal-to-noise properties, but also 
according to their impact on downstream neural circuits and, ultimately, on behavior. 
Introduction

Sensory neurons respond to physical stimuli, and this relationship is often quantified by plot- 
ting their evoked activity — for instance, the mean firing rate — as a function of a relevant stimulus 
parameter. The resulting response functions or tuning curves have been the subject of much the- 
oretical work, particularly in vision. In trying to understand such tuning curves, the emphasis 
has been on information maximization, the main idea being that sensory neurons should represent 
the sensory world as accurately and efficiently as possible [1-3]. This principled approach, known
as the efficient coding hypothesis, has been extremely successful at predicting the receptive field 
properties of neurons in early visual [4-7] and auditory [8,9] areas, and is consistent with numerous
experimental observations [10-13].

However, information maximization is not enough. Such a principle cannot completely account 
for the response characteristics of cortical neurons, particularly beyond early sensory areas, because 
it does not consider how the encoded information will be used, if at all — it would not make sense 
for sensory neurons to pack lots of information into parts of feature space that are of little relevance 
to the animal. A recent study [14] investigating auditory responses in grasshoppers illustrates
this. Primary auditory receptors in grasshoppers do not respond equally well to different kinds 
of environmental sounds. Instead, the stimulus ensemble that maximizes their information rate 
consists of short segments of grasshopper songs that mark the transitions between song syllables 
[15]. Thus, such early receptor neurons seem to be highly specialized for describing a rather small
set of sounds that are relevant for a specific behavior, namely, discriminating grasshopper songs 
This raises an interesting question: does an animal's behavior influence the shapes of its sensory
tuning curves? And if so, which features would be most sensitive to behavioral constraints?

Figure 1. Schematic of the model. There are $n$ sensory or basis neurons that respond to $M$
stimuli and drive $N$ motor neurons downstream. The firing rate of motor neuron $a$ (shown filled)
when stimulus $k$ is presented is equal to $R_{ak} = \sum_{i=1}^{n} w_{ai} r_{ik}$, where $r_{ik}$
is the firing rate of sensory neuron $i$ and $w_{ai}$ is the connection (shown in red) from sensory
neuron $i$ to downstream neuron $a$. For each motor neuron $a$, the driven response $R_{ak}$
should approximate as closely as possible a desired response $F_{ak}$.



There are, in fact, two motivations for addressing this problem. One is the limitation of the
efficient coding principle just discussed. The other is what I see as a theoretical mystery: the
ubiquity of monotonic tuning curves. Tuning curves come in two main flavors, single-peaked and
monotonic (increasing or decreasing). Bell-shaped curves with a single peak are the textbook
example of tuning functions. They are indeed quite common [16-20], and many modeling studies
have investigated the coding properties of arrays of such unimodal curves subject to some form of
noise [21-25]. Monotonic dependencies on stimulus parameters, however, have also been amply
documented, not only in the somatosensory system [26-28] but also in other modalities [29-31].
Monotonic tuning curves have received little attention from theorists. No analysis has been
reported from the standpoint of efficient coding, and it is not clear whether they present any
advantage regarding other criteria, such as learning [32]. To complicate matters further, some
neuronal populations show mixtures of monotonic and peaked curves [33-35].

Why is there such a range of tuning curve shapes? And, in particular, what promotes the devel- 
opment of monotonic profiles? To investigate more closely whether behavioral factors play a role 
in this problem, here I evaluate the responses of a neuronal population not only in relation to their 
sensory inputs but also in terms of the range of outputs that they are capable of generating. The 
sensory tuning curves are seen as a set of basis functions from which other functions of the stimulus
parameters can be easily constructed [36,37]; these other functions represent motor activity or
actions that are generated in response to a stimulus. The idea is that, if something can be said about 
the statistics of the downstream motor activity, then we should be able to say something about the 
sensory tuning curves that are optimal for driving such activity. 

Results 

Tuning Curves as Basis Functions 

To begin, the problem needs to be defined mathematically. The situation can be described using
some of the tools of classic function approximation [38,39] and is schematized in Figure 1: $n$ basis
neurons respond to $M$ stimuli or conditions and drive $N$ additional downstream neurons whose
output should approximate a set of desired functions $F$. The basis neurons represent sensory
neurons whose tuning curves we are interested in, and the downstream units represent motor neurons
that contribute to generating actions. The key quantity to study is the matrix $r$, where $r_{ik}$ is
the firing rate of basis neuron $i$ evoked by stimulus $k$. These basis responses may have intrinsic
variability (noise), so their mean values are denoted as $\langle r_{ik} \rangle$, where the brackets
indicate an average over multiple presentations of the same stimulus. Since the second index
parameterizes stimulus values, the tuning curve of cell $i$ is simply $\langle r_{ik} \rangle$ plotted
as a function of $k$. As mentioned above, the rationale of this approach is that, although the motor
responses $F$ may be largely unknown in reality, if they have some regularity or statistical
structure, this should partially determine the optimal shapes of the sensory tuning curves
$\langle r \rangle$. For the moment, however, pretend that the repertoire of motor responses $F$
that should be elicited by the stimuli is fully known.

To proceed, a mechanism is needed for the sensory neurons to communicate with the motor
neurons. The simplest assumption is that the downstream motor units are driven through weighted
sums. Thus, the response of downstream unit $a$ to stimulus $k$ is
$R_{ak} = \sum_{i=1}^{n} w_{ai} r_{ik}$, where $w_{ai}$ represents the synaptic connection from
sensory neuron $i$ to downstream neuron $a$ (Figure 1). In matrix notation, this is $R = w r$. In
this simple model, the shapes of the tuning curves become important when there are more
downstream neurons than basis neurons ($n < N$) and when there is noise, so both conditions are
assumed to be true.
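
The weighted-sum readout can be made concrete with a few lines of NumPy. This is only a minimal sketch: the sizes, the random rates and the random weights are arbitrary illustrative choices, not values from the model. It checks that the element-wise definition of $R_{ak}$ and the matrix form $R = w r$ agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): n basis neurons, M stimuli, N downstream neurons, with n < N
n, M, N = 4, 50, 10

r = rng.uniform(0.0, 40.0, size=(n, M))  # r[i, k]: rate of basis neuron i for stimulus k
w = rng.normal(size=(N, n))              # w[a, i]: weight from basis neuron i to downstream unit a

# Element-wise definition: R[a, k] = sum over i of w[a, i] * r[i, k]
R_loop = np.zeros((N, M))
for a in range(N):
    for k in range(M):
        R_loop[a, k] = sum(w[a, i] * r[i, k] for i in range(n))

# Matrix form: R = w r
R = w @ r
```

Because $n < N$ here, the $N$ rows of `R` all lie in the span of only $n$ tuning curves, which is why the shapes of those curves matter.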

Next, recall that the job of downstream unit $a$ is to produce the target motor response $F_a$ (where
$F_a$ is row $a$ of $F$). Therefore, what is needed is for the driven responses, $R = w r$, to
approximate as closely as possible the desired ones, $F$. Crucially, however, different sets of tuning
curves $\langle r \rangle$ will vary in their capacity to generate the target downstream responses.
This capacity is quantified using an error measure denoted as $E_B$. When $E_B$ is zero, the sensory
(basis) neurons are most accurate and the driven responses are equal to the desired ones; when $E_B$
is 1, the driven activity has little or no resemblance to the desired activity and the error is
maximal. The derivation of $E_B$ is presented in the Methods section. What is important, however, is
to understand its dependencies, which are as follows:
$E_B = E_B(\langle r \rangle, \sigma, \{s_k\}, \Phi)$. First, the error depends on the sensory tuning
curves $\langle r \rangle$ and on their noise, $\sigma$. Second, note that there is no dependence on
the synaptic weights. This is because $E_B$ is constructed assuming that, for each
$\langle r \rangle$, the best possible synaptic weights are always used. Third, $E_B$ depends on how
often each stimulus is shown; that is, on the set of coefficients $\{s_k\}$, where $s_k$ is the
probability that stimulus $k$ is presented. Finally, $E_B$ does not depend directly on the actual
motor responses $F$. Instead, the key independent quantity is their correlation matrix $\Phi$, which
captures their overall statistical structure. Its components are
In essence, $\Phi$ represents an average over all the downstream motor responses that the basis
neurons have to approximate. This average corresponds to drawing the $F_{ak}$ values from given
distributions, or equivalently, to choosing multiple functions $F_a$ from a given class (see below).

In summary, given the noise of the neurons ($\sigma$), the statistics of the stimuli ($\{s_k\}$), and
the statistics of the downstream responses ($\Phi$), the error $E_B$ can be calculated for any set of
sensory tuning curves $\langle r \rangle$.
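
As a rough illustration of how an error with these dependencies could be computed, the sketch below assumes additive, independent noise with the same standard deviation `sigma` for every neuron and stimulus, takes the correlation matrix to be the average of $F_{ak} F_{al}$ over downstream units, and uses the least-squares optimal weights. The function name `basis_error`, these noise and averaging assumptions, and the normalization by the trace of $\Phi S$ are choices made here for illustration; the actual definition of $E_B$ is the one derived in the Methods.

```python
import numpy as np

def basis_error(r_mean, sigma, s, Phi):
    """Normalized approximation error with optimal weights (illustrative sketch).

    r_mean : (n, M) mean tuning curves
    sigma  : noise standard deviation, assumed equal for all neurons and stimuli
    s      : (M,) stimulus probabilities, summing to 1
    Phi    : (M, M) correlation matrix of the downstream functions
    """
    S = np.diag(s)
    n = r_mean.shape[0]
    # Second-moment matrix of the noisy basis responses, averaged over stimuli
    C = r_mean @ S @ r_mean.T + sigma**2 * np.eye(n)
    # Part of the target structure captured by the mean tuning curves
    G = r_mean @ S @ Phi @ S @ r_mean.T
    return 1.0 - np.trace(np.linalg.solve(C, G)) / np.trace(Phi @ S)

# Usage: if the basis responses equal the targets and there is no noise, the error vanishes
rng = np.random.default_rng(0)
N, M = 5, 20
F = rng.normal(size=(N, M))        # hypothetical target functions
Phi = F.T @ F / N                  # assumed form: average of F[a, k] * F[a, l] over units a
s = np.full(M, 1.0 / M)            # equally probable stimuli

err_perfect = basis_error(F, 0.0, s, Phi)
err_noisy = basis_error(F, 1e6, s, Phi)
```

Note that the weights never appear as an argument: they are eliminated by the least-squares solution, in line with the statement that the best possible weights are always assumed.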

What Determines the Optimal Tuning Curves? 

So far, I have set up the problem and developed a quantity that measures the effectiveness of the
sensory tuning curves as building blocks for constructing the desired motor responses. Recall,
however, that the goal is to find the best tuning curves. In the present formalism, this is the same
as asking what tuning curves $\langle r \rangle$ minimize $E_B$.

It should be noted, however, that $E_B$ cannot completely determine the optimal tuning curves.
This is because the problem is fundamentally under-constrained: since the network model is linear
($R = w r$), any transformation by an invertible matrix $A$ such that $w \to wA$ and
$r \to A^{-1} r$ produces the same approximation and thus leaves the error unchanged. Therefore,
additional conditions on $w$ or $r$ are required to make the solution unique. These conditions are
crucial, in that they can lead to quite different results [6,40], but it is instructive to ignore them
momentarily; this provides some intuition into the problem, as well as a lower bound on $E_B$.
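
The degeneracy is easy to verify numerically. In the sketch below, the sizes and the particular matrix `A` are arbitrary illustrative choices; `A` is built as a unit upper-triangular matrix so that it is guaranteed to be invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, N = 3, 20, 6                        # illustrative sizes (assumed)

r = rng.uniform(0.0, 40.0, size=(n, M))   # basis responses
w = rng.normal(size=(N, n))               # synaptic weights

# Unit upper-triangular matrix: determinant is 1, so A is always invertible
A = np.eye(n) + np.triu(rng.normal(size=(n, n)), k=1)

w_new = w @ A                      # w -> w A
r_new = np.linalg.solve(A, r)      # r -> A^{-1} r

# The driven responses are identical, so any error based on R is unchanged
same_output = np.allclose(w_new @ r_new, w @ r)
```

The transformed responses `r_new` will generally look quite different from `r` and may even be negative, which hints at why constraints such as bounded, non-negative firing rates help to single out a solution.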

Before considering specific examples, it is important to discuss the key factors that will
determine the solution. Intuitively, the tuning curves should match, as much as possible, the $N$
target functions $F_a$. If all the functions are different, then clearly a lot of tuning curves will
be needed for accurate approximation. In this case, 'different' means 'not highly correlated', which
in turn means that $\Phi$ will have large values along its diagonal (see below). On the other hand,
if the functions $F_a$ are similar to each other, then very few tuning curves should suffice. Or, if
more tuning curves are available, many of them can be used to cover specific regions where $\Phi$
varies more abruptly. Therefore, what matters when designing tuning curves is really the number of
distinct functions that need to be approximated, as measured both by how big $N$ is and how
correlated the functions $F_a$ are.

The rest of this section formalizes this intuition and describes more precisely the dependence
of the optimal tuning curves on $\Phi$. The reader who wishes to skip the mathematical details may
safely move on to the next section.

To better understand the effect of $\Phi$, it is useful to decompose it using a special set of vectors
(eigenvectors) and their corresponding coefficients (eigenvalues). The idea is to use the
eigenvectors of $\Phi$ to construct the optimal tuning curves. Assuming that all stimuli are equally
probable, the key property of $\Phi$ is that its $M$ eigenvalues are all non-negative and add up to
$M$ (see Methods). When $\Phi$ results from averaging either just a few functions ($\ll M$) or many
functions with similar shapes, only a few eigenvalues are significantly larger than zero. Conversely,
when the average involves many different functions, most eigenvalues are close to 1 and $\Phi$ is
strongly diagonal.
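
These two regimes can be illustrated numerically. The sketch below assumes that the correlation matrix is the average of $F_{ak} F_{al}$ over functions, rescaled so that its trace equals $M$; this normalization, the helper name `correlation_matrix`, and the two function families are assumptions chosen for illustration, not the procedure from the Methods.

```python
import numpy as np

def correlation_matrix(F):
    """Average of F[a, k] * F[a, l] over functions a, rescaled so that trace = M (assumed)."""
    N, M = F.shape
    Phi = F.T @ F / N
    return Phi * (M / np.trace(Phi))

rng = np.random.default_rng(2)
M = 50
x = np.linspace(0.0, 1.0, M)

# Case 1: many functions with similar shapes (one slow sinusoid, slightly jittered)
amp = rng.uniform(0.5, 1.5, size=(2000, 1))
phase = rng.normal(scale=0.1, size=(2000, 1))
F_similar = amp * np.sin(2.0 * np.pi * x + phase)

# Case 2: many unrelated functions (independent values at every stimulus)
F_diverse = rng.normal(size=(2000, M))

# eigvalsh returns eigenvalues in ascending order; reverse to get largest first
eig_similar = np.linalg.eigvalsh(correlation_matrix(F_similar))[::-1]
eig_diverse = np.linalg.eigvalsh(correlation_matrix(F_diverse))[::-1]
```

In the first case a single eigenvalue carries almost all of the total $M$; in the second the eigenvalues cluster around 1, as expected for a nearly diagonal matrix.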

Keeping these properties in mind, as well as the fact that $E_B$ varies between 0 and 1, now
consider a single basis neuron. Assume that its tuning curve is proportional to an eigenvector of
$\Phi$ with eigenvalue $\lambda$. In that case, $E_B$ depends on only two numbers, $\lambda$ and a
signal-to-noise ratio $\rho$ that is equal to the mean response squared divided by the mean variance
of the neuron. That is,

$$E_B(n=1) = 1 - \frac{\lambda}{M} \, \frac{\rho}{\rho + 1} \qquad (2)$$

(see Methods). This expression leads to three important observations. (1) When the neuron's
variance increases, $\rho$ tends to zero and the error tends to 1. Thus, as expected, higher noise
always pushes the error toward its maximum. (2) The worst-case scenario is $\lambda = 0$. This
produces the maximum error, regardless of the noise, and occurs when the tuning curve is completely
different from (orthogonal to) all the target functions used to compute $\Phi$. (3) For any
signal-to-noise ratio, the lowest error occurs when $\lambda$ is the largest eigenvalue of $\Phi$, in
which case the single tuning curve is equal to the so-called first principal component [41] of
$\Phi$. This one tuning curve may suffice to generate a very small error, if the noise is low and
$\lambda = \lambda_{\max} \approx M$. But, on the other hand, if $\lambda_{\max}$ is small, the error
will be large even if the tuning curve has the optimal shape and zero noise.
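
The three observations can be checked by evaluating Equation 2 directly. The helper name `single_neuron_error` and the numerical values below are illustrative assumptions only.

```python
def single_neuron_error(lam, rho, M):
    """Equation 2: error for one basis neuron whose tuning curve is an eigenvector of Phi.

    lam : eigenvalue associated with that eigenvector (between 0 and M)
    rho : signal-to-noise ratio (mean squared response over mean variance)
    M   : number of stimuli
    """
    return 1.0 - (lam / M) * rho / (rho + 1.0)

M = 50
err_high_noise = single_neuron_error(lam=40.0, rho=0.0, M=M)   # (1) rho -> 0 gives error 1
err_orthogonal = single_neuron_error(lam=0.0, rho=100.0, M=M)  # (2) lam = 0 gives error 1
err_best_case = single_neuron_error(lam=M, rho=1e9, M=M)       # (3) lam near M, low noise
err_small_lam = single_neuron_error(lam=1.0, rho=1e9, M=M)     # small largest eigenvalue
```

With `lam=1.0` the error stays close to 1 - 1/M even at an essentially infinite signal-to-noise ratio, which is the last point made in the text.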

The efficacy of the single basis neuron thus depends on its variability, on the largest eigenvalue
of $\Phi$, and on the similarity between the tuning curve and the eigenvectors. An analogous result is
obtained with more neurons, except that additional eigenvalues and eigenvectors become involved
(see Methods). Specifically, with $n$ basis neurons and no noise, the minimum error that can be
achieved is

$$\min(E_B) = 1 - \frac{1}{M} \sum_{i=1}^{n} \lambda_i \qquad (3)$$

where $\lambda_1, \ldots, \lambda_n$ are the $n$ largest eigenvalues of $\Phi$. The key in this
expression is that the sum involves $n$ terms only. This is significant because, if $\Phi$ has just a
few large eigenvalues, the sum of the $n$ largest ones may approach $M$ even if $n \ll M$, so few
noiseless tuning curves with the right shapes will suffice for representing accurately all the
desired motor responses. This happens, for instance, when the motor responses are similar to each
other, i.e., are highly correlated. Conversely, if many eigenvalues are close to 1, then $\Phi$ is
strongly diagonal and it is certain that a much larger number of sensory neurons will be needed,
even if noise is not a factor. Numerical results support these theoretical conclusions (see
Supporting Information).
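
Equation 3 translates directly into a short function. In this sketch the name `min_error` and the two test matrices are illustrative assumptions; both matrices have trace $M$, matching the normalization stated in the text for equally probable stimuli.

```python
import numpy as np

def min_error(Phi, n):
    """Equation 3: lowest achievable error with n noiseless basis neurons."""
    M = Phi.shape[0]
    eigvals = np.linalg.eigvalsh(Phi)      # ascending order
    return 1.0 - eigvals[M - n:].sum() / M  # sum of the n largest eigenvalues

M = 50

# Fully uncorrelated targets: Phi is the identity, all eigenvalues equal 1
err_diagonal = min_error(np.eye(M), n=5)

# Perfectly correlated targets: Phi is all ones, a single eigenvalue equal to M
err_rank_one = min_error(np.ones((M, M)), n=1)
```

With a diagonal $\Phi$, five neurons leave 90% of the error in place, whereas a single neuron is enough when all the target functions share one shape.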

Monotonic Versus Non-Monotonic Tuning Curves 

Armed with a criterion that quantifies the accuracy of the sensory tuning curves and takes
into account the statistics of the motor outputs, we can now ask: what sets of tuning curves are
optimal when there is variability and specific families of downstream functions are considered? To
investigate this, each tuning curve was parameterized by four numbers, such that either monotonic
or unimodal profiles with a large variety of shapes could be produced, and a numerical routine was
used to find optimal parameter combinations that minimized $E_B$ (see Methods). By limiting the
possible tuning curve shapes, this procedure eliminated the ambiguity problem mentioned earlier.

Figure 2. Optimal tuning curves for four classes of downstream functions. (A) High-frequency oscillating
functions. Each function F was composed of 8 sinusoids of random phase and amplitude. Four examples are
shown. Inset depicts correlation matrix obtained from 5000 functions. (B) Low-frequency oscillating functions.
(C) Saturating monotonic functions. Each F was an increasing or decreasing sigmoidal curve of random
steepness and center point. (D) Non-saturating monotonic functions. Each F was an increasing or decreasing
exponential curve with random steepness. (E-H) Optimal sets of 2, 4 and 8 tuning curves for the classes in the
corresponding columns. Shown responses minimized $E_B$ and were constrained to remain between 0 and 40
spikes/s. In the plots, the x-axis is the stimulus number.

Figure 2 illustrates the results of computer experiments in which optimal tuning curves were 
obtained numerically for four classes of downstream responses. Examples of functions within each 
class are shown on the top row, next to the corresponding $\Phi$ matrices (Figure 2A-D). The graphs
below show the sets of 2, 4 and 8 tuning curves that minimized $E_B$ in each case. When the target
functions are non-monotonic (Figure 2A and 2B), the optimal tuning curves are themselves non- 
monotonic (Figure 2E and 2F). Similarly, when the target functions are monotonic (Figure 2C and 
2D), the optimal tuning curves are also monotonic (Figure 2G and 2H), even though the target 
classes comprise both increasing and decreasing functions. The detailed features of the optimal 
tuning curves clearly depend on the specifics of the target class. For instance, the number of peaks 
and troughs of the oscillating target functions affects the optimal width of the unimodal curves 
(compare Figure 2E and Figure 2F). Most notably, however, because the noise properties and stim- 
ulus statistics remained constant, all the differences across columns are due to constraints that act 
downstream from the sensory neurons. 
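
To give a sense of how such target classes and their correlation matrices might be generated, the sketch below builds an oscillating class (8 sinusoids of random phase and amplitude, as in Figure 2A) and a sigmoidal class (as in Figure 2C), then averages over 5000 functions. The specific frequencies, the parameter ranges, the averaging formula for $\Phi$, and the helper names `oscillating` and `sigmoidal` are assumptions made here; the settings actually used are given in the Methods.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 50
k = np.arange(1, M + 1)                    # stimulus numbers 1..M

def oscillating(n_funcs, freqs):
    """Sums of sinusoids with random phase and amplitude.

    freqs: cycles per stimulus range for each sinusoid (an assumed choice).
    """
    f = np.asarray(freqs, dtype=float).reshape(1, -1, 1)
    amp = rng.uniform(0.0, 1.0, size=(n_funcs, f.shape[1], 1))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_funcs, f.shape[1], 1))
    return (amp * np.sin(2.0 * np.pi * f * k / M + phase)).sum(axis=1)

def sigmoidal(n_funcs):
    """Increasing or decreasing sigmoids with random steepness and center point."""
    center = rng.uniform(1.0, M, size=(n_funcs, 1))
    steep = rng.uniform(0.1, 1.0, size=(n_funcs, 1))
    sign = rng.choice([-1.0, 1.0], size=(n_funcs, 1))
    return 1.0 / (1.0 + np.exp(-sign * steep * (k - center)))

F_osc = oscillating(5000, freqs=np.arange(4, 12))   # 8 sinusoids per function
F_sig = sigmoidal(5000)

# Correlation matrices: average of F[a, k] * F[a, l] over the 5000 functions (assumed form)
Phi_osc = F_osc.T @ F_osc / F_osc.shape[0]
Phi_sig = F_sig.T @ F_sig / F_sig.shape[0]
```

Feeding matrices like `Phi_osc` and `Phi_sig` to an optimizer over tuning-curve parameters is the step that would distinguish the columns of Figure 2; that optimization is not reproduced here.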

These results were highly robust with respect to various manipulations (see Supporting Infor- 
mation). Increasing the noise, adding a power constraint, using non-uniform stimulus probabili- 
ties, or parameterizing the tuning curves differently did not alter the main finding: optimal tuning 
curves are predominantly monotonic or non-monotonic depending on the type of downstream ac- 
tivity they are meant to evoke. Furthermore, manipulating the stimulus probabilities alone never 
gave rise to monotonic curves; for this, a monotonic trend in the downstream responses was nec- 
essary. 

As mentioned in the Introduction, both unimodal and monotonic tuning curves are found in 
various parts of the brain, and this diversity has remained unexplained (see also the Discussion). 

Figure 3. Optimal tuning curves for downstream functions that have both peaked and monotonic
components. (A) Four examples of functions F obtained by combining a localized oscillatory
function (with a Gaussian envelope) and a saturating monotonic function. Such functions represent
hypothetical motor responses to stimuli at various binocular disparities. Inset depicts
correlation matrix. (B) Optimal sets of 2, 4 and 8 sensory tuning curves obtained with low noise.
(C) As in (B), but with high noise and high power cost. In all plots, the x-axis represents
binocular disparity.



The above results suggest that the two types of responses may arise not because of information- 
coding considerations but because of differences in the actions that various types of stimuli ulti- 
mately trigger. For instance, some stimulus parameters, such as the orientation of a bar, should 
lead to approximately the same sorts of movements regardless of the parameter's value. But other 
parameters or features, such as image contrast or sound intensity, have an obvious directionality, 
in that salient stimuli of high contrast or high intensity are more likely to lead to action. Thus, 
sensory neurons might respond in a qualitatively different way to features with and without such 
a behavioral bias because that is the most effective way to generate the appropriate actions. The 
next two sections present two realistic situations where such motor asymmetries may arise. 

Mixed Tuning Curves for Binocular Disparity 

Binocular disparity provides an interesting example of a signal that is likely associated with an 
intrinsic bias in behavior. To see the source of the asymmetry, consider what possible movements 
may be triggered by a visual stimulus at a given disparity. If a stimulus is seen near zero disparity 
(i.e., at the plane of fixation), many subsequent actions are possible, such as reaching, biting, fix- 
ating, etc. In contrast, if a relevant stimulus appears at a positive disparity (i.e., behind the plane 
of fixation), a diverging eye movement should typically follow, because that will bring the object 
onto the plane of fixation for more detailed examination. Conversely, converging eye movements 
should be seen more often following stimuli of negative disparity (i.e., in front of the plane of fixa- 
tion). As a consequence, an oculomotor unit that is strongly activated at positive disparities should 
have a tendency to fire weakly at negative disparities, and vice versa. This rationale implies that, for
any relevant oculomotor cell, the responses at opposite ends of the disparity range should tend to 
be anti-correlated, and these should be approximately independent of the responses triggered near 
zero disparity. The downstream functions in Figure 3A are meant to capture this statistical regular- 
ity. They vary strongly in the middle of the stimulus range but have much more stereotyped values 
at the extremes. 

Optimal tuning curves for this class of downstream functions are shown in Figure 3B and 3C.
These curves have two novel features: they mix unimodal and monotonic profiles, and include
intermediate curves with a peak superimposed on a monotonic component. Recently, it has been
shown that disparity tuning curves in area V4 have precisely these characteristics. The V4
population comprises a continuum of disparity tuning patterns that includes monotonic (the classic
near and far cells), unimodal (the classic tuned cells) and intermediate cells [33].

Figure 4. Optimal tuning curves for downstream functions that vary more rapidly near one end of the
stimulus range. (A) Three examples of continuous, oscillatory functions (with Gaussian envelopes)
that oscillate at high frequency near stimulus 1 and at progressively lower frequency near stimulus
50. They represent hypothetical motor responses of bats as functions of echo delay or target
distance. (B) Set of 8 tuning curves that minimized $E_B$ given the correlation matrix in (A) and
low noise. (C) As in (B), but with high noise and high power cost. (D) Three examples of
discontinuous F functions. Each one is a collection of constant segments placed randomly. Segment
width increased linearly as a function of segment location on the x-axis. (E) Set of 8 tuning
curves that minimized $E_B$ given the correlation matrix in (D) and low noise. (F) As in (E), but
with high noise and high power cost. In all plots, the x-axis represents echo delay.

Widening Tuning Curves for Echo Delay 

The final example addresses the issue of tuning curve width. The downstream functions illus- 
trated in Figure 4A are meant to capture a distinctive aspect of the behavior of bats, which locate 
prey by means of echolocation. In this case, consider a bat pursuing a moth. From far away, the 
bat can approach the moth by following its average path, smoothing out the moth's high-frequency 
maneuvers. At a close distance, however, the bat must turn at least as sharply as the moth itself 
in order to catch it, particularly in a cluttered environment [42,43]. Thus (and this is the crucial
assumption), when a bat flies toward a small target, its maneuvers must be faster as the target is
approached. This postulate is translated into a statement about motor responses by generating 
functions that vary rapidly near stimulus 1 (corresponding to near targets, or short echo delays) 
and vary progressively more slowly at higher stimuli (corresponding to far targets, or long echo 
delays). Examples of such hypothetical motor responses are shown in Figure 4A. The optimal tun- 
ing curves for this case are non-monotonic, as might have been expected, but most notably, their 
widths increase as functions of the preferred stimuli (Figure 4B). This effect is extremely robust. It 
was also observed when the tuning curves were parameterized differently, and when high noise 
and high power cost were used (Figure 4C). Many auditory neurons of the bat's sonar system have 
this particular property. They are tuned to echo delay, and their tuning-curve widths vary linearly 
with the so-called 'best delay' [44,45], which is the echo delay at which the peak response is elicited.

Again, note that the model generates this result based on a single statistical assumption about 
the motor responses, which is a progressive change in their absolute rate of variation along the 
echo-delay range. This is confirmed by Figure 4D, for which radically different downstream func- 
tions were generated. Here, piecewise-constant functions were used, each composed of a variable 
number of segments that had random amplitudes and locations. The only structure was a correla- 
tion between segment length and segment location along the x-axis. It is this correlation that gives 
rise to the systematic change in tuning curve width (Figure 4E and 4F). 
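
A piecewise-constant target function of this kind could be generated along the following lines. This is a simplified sketch: the segments here tile the stimulus range deterministically rather than being placed at random, and the function name and the parameters `w0` and `slope` are illustrative assumptions, not values from the study.

```python
import numpy as np

def piecewise_constant(M, rng, w0=1.0, slope=0.2):
    """One target function on stimuli 1..M made of constant segments.

    Segment width grows linearly with segment location (w0 + slope * start);
    each segment gets an independent random amplitude.
    """
    F = np.empty(M)
    start = 0
    while start < M:
        width = max(1, int(round(w0 + slope * start)))
        F[start:start + width] = rng.uniform(-1.0, 1.0)  # slice is clipped at M
        start += width
    return F

rng = np.random.default_rng(4)
F_example = piecewise_constant(50, rng)

# Recover the segment widths from the positions where the function changes value
edges = np.flatnonzero(np.diff(F_example) != 0) + 1
widths = np.diff(np.concatenate(([0], edges, [F_example.size])))
```

The recovered `widths` grow along the x-axis (the last segment may be truncated by the end of the range), which is the only structure the text says is needed to produce widening tuning curves.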

A key question here, however, is whether curves of increasing width could also result from an
uneven distribution of stimulus probabilities $s_k$, without assuming an asymmetry in the
downstream functions. Technically the answer is yes: a progressive widening was obtained by using
monotonically increasing stimulus probabilities together with the downstream functions in
Figure 2A and 2B. But there were three severe problems with this purely sensory mechanism: (1) the
effect required high noise, (2) it was much weaker, meaning that variations in width were small,
and (3) most importantly, it placed the narrow tuning curves in the region of highly probable
stimuli, which for the bat means that nearby targets must be encountered much more often than far
away ones. Therefore, the puzzling widening of sensory tuning curves documented in the bat may
be explained more parsimoniously by assuming that flight control needs to be faster as the target
gets closer.

Other Tuning Curve Shapes 

The parametric approach presented here allows a direct comparison between monotonic and 
peaked tuning curves. Would the results hold, however, if other shapes were allowed? To address 
this question, optimal tuning curves were recalculated using drastically different constraints. The 
basis responses were simply required to be positive and bounded, whereas the synaptic weights 
were constrained to be sparse. With sparse connectivity, each downstream function is approxi- 
mated using only a subset of all the available basis neurons. The optimal tuning curves obtained 
with this method were much more variable, as expected given the absence of restrictions on their 
shapes, and often had multiple peaks. However, an index measuring the monotonicity of the curves 
in each population was computed, and in terms of this index the results were very similar to those 
obtained with parameterized curves: the monotonicity of the basis responses was determined by 
the monotonicity of the downstream functions, and conversely, strongly monotonic tuning curves 
could not be produced by manipulating the stimulus statistics alone. Details of these numerical 
experiments are discussed in the Supporting Information. 
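
The index used in the study is defined in the Supporting Information and is not reproduced here. Purely as an illustration of what such a measure could look like, the hypothetical index below compares the net change of a curve with its total variation, giving 1 for a strictly monotonic curve and 0 for a curve that returns to its starting value.

```python
import numpy as np

def monotonicity_index(curve):
    """Hypothetical index: |net change| / total variation, between 0 and 1.

    This is an assumed definition for illustration, not the one used in the study.
    """
    d = np.diff(np.asarray(curve, dtype=float))
    total = np.abs(d).sum()
    return 0.0 if total == 0 else abs(d.sum()) / total

idx_monotonic = monotonicity_index([0, 1, 2, 3])        # steadily increasing
idx_peaked = monotonicity_index([0, 1, 2, 1, 0])        # symmetric single peak
idx_mixed = monotonicity_index([0, 2, 1, 3])            # peak on a rising trend
```

Averaging such an index over a population would give one number per set of tuning curves, which is the kind of summary needed when the curves are too irregular to classify by eye.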

Discussion 

Both unimodal- and monotonic-encoding populations of neurons are common and are maintained
by different brain regions [16-20,26-35], including areas beyond the periphery where tuning
curves seem to be actively synthesized [28,46]. Yet the factors that determine whether a specific
neuronal population develops monotonic, unimodal, or mixed responses have remained a mystery. 
Computationally, unimodal curves are different from monotonic ones in two ways. First, they al- 
low learning to be local, in the sense that changing the weight of a peaked curve affects the output 
function only over the range of the curve, not over the entire input space [36,38]. Second, it seems
that representing multiple values simultaneously would be much easier with peaked curves, es- 
pecially when the difference between coded values is large relative to the curve width. Although 
the importance of these differences remains unclear, they further illustrate the lack of theoretical 
justification for monotonic sensory responses. A possible solution to this enigma, however, is to 
consider the types of actions that various stimuli ultimately trigger. 

Maximizing Fisher Information Is Not Enough 

The classical approach to sensory coding involves information maximization 11251 . Thus, it 
would seem that some of the examples discussed above could be formulated in more familiar terms 
by requiring that more Fisher information [23,24], or equivalently higher accuracy, be found in certain
parts of the sensory space. For instance, in analogy with Figure 3A, what happens if much
higher accuracy is needed in the middle of the stimulus range than at the edges? Could such con- 
ditions lead to monotonic tuning curves? 

The answer is no. This is because a function specifying a desired relative accuracy at each point 
in the stimulus range is exactly equivalent to the set of coefficients that were used to represent 
stimulus probabilities. That is, $s_k$ can also be interpreted as the weight or importance of the error
between driven and desired motor responses when stimulus $k$ is presented (see Equation 6). For
instance, when these coefficients had a Gaussian instead of a uniform profile, the results were 
entirely consistent with an increase in Fisher information at the middle of the range; more tuning 
curves were located near the middle, and those were narrower than the ones at the edges. These 
effects depended on the level of noise, as expected, and were rather subtle, but the key point is that 
such manipulations had no bearing on whether the optimal tuning curves were monotonic or not
(see the Supporting Information for further results). Therefore, while information maximization is 
clearly important, the downstream functions in this model have a much stronger influence on the 
optimal tuning curve shapes. 

Inputs, Outputs and Optimality 

Previous theoretical studies have attempted to explain the properties of sensory neurons based 
on two elements, an optimality assumption (efficient coding) and the statistics of their inputs, i.e., 
the statistics of natural images or natural sounds [3,5-9,13,14]. Conceptually, the approach here
was not dissimilar. An optimality assumption, accurate function approximation, together with the 
statistics of motor responses were used to infer the shapes of sensory tuning curves. However, 
the present model works backwards, in that it requires knowledge about downstream rather than 
upstream events (note, however, that stimulus statistics are still taken into account through the
coefficients $s_k$ and through correlations with the downstream functions they evoke). Clearly, whereas
measuring the statistics of natural images or sounds is straightforward, determining the statistics 
of motor activity associated with specific stimuli poses a challenge. However, assuming that such 
motor statistics have some structure, because of the animal's behavior, the results of the present 
model are straightforward: the shapes of the optimal sensory tuning curves should be adapted to 
that structure. 

Two main conclusions follow from these results, a general one and a specific one. The gen- 
eral observation is that, contrary to what is implicitly assumed in most studies, the optimality of 
sensory-triggered responses depends not only on their variability and on the statistics of stimuli, 
but also on the downstream events driven by those responses [14]. If the downstream demands
change, the responses considered optimal will change as well, at least as required by minimization 
of the performance measure used here, $E_B$. One particular consequence of this is that the optimal
width of peaked tuning curves is not uniquely determined by signal-to-noise considerations [23,25]
(more on this below). This suggests that a comprehensive understanding of the firing properties of 
sensory neurons requires knowledge of the downstream impact of their responses. 

In retrospect, this point may seem obvious. If the motor functions to be approximated are 
monotonic, so should be the tuning curves of upstream neurons that drive them. However, this 
idea has not been formally articulated before. Furthermore, previous explanations of key features 
of tuning curves — tuning curve width, degree of overlap between curves, number of peaks, etc. 
— have always been based on arguments about coding efficiency. The simple model presented 
here indicates that such features may also generally depend on the motor actions performed by the 
animal. This, I believe, is a new insight, because it applies to neurons that are firmly considered as 
sensory. 

The specific point is that monotonic and non-monotonic curves are optimal under subtly dif- 
ferent circumstances, which may depend on what can be termed a 'behavioral bias'. This simply 
refers to an asymmetry in the relevant sensory stimulus. A bias exists when different parts of the 
stimulus range lead to different sets of possible actions, so that not all stimulus values are equal. 
The classes of downstream functions used here were meant to abstract this distinction in a simple 
way, and the results suggest that monotonic curves are efficient when there is such an asymmetry. 
Image contrast [30] and pressure on the skin [27] are good examples because, just on the basis of
detection probability, high values are much more likely to lead to behavioral responses than low 
ones. But in general, weaker or more restricted biases may lead to populations of neurons with 
both monotonic and peaked tuning curves, as seen experimentally lT3"3Tl3"5l . 

Model Predictions 

If the model is correct, some variations in tuning properties across sensory populations should 
correspond to adaptations that enhance motor activity. Specifically in the case of arrays of Gaussian 
tuning curves [21–25], the model predicts that downstream motor responses should vary more
rapidly in the stimulus range where the Gaussian curves are narrower. For instance, according to
Figure 4, echolocating bats must compute motor functions that vary a lot around zero echo delay. 
Elegant experimental studies by Moss and collaborators are consistent with this interpretation. 
They show not only that the rate of turning of bats indeed increases as a target is approached [42,43],
as was argued earlier, but also that their vocalizations speed up in several ways: (1) the rate at 
which sonar calls are emitted increases as the target gets near, (2) the duration of each call decreases, 
and (3) each frequency-modulated call consists of a sweep from a high to a low frequency, and the 
speed with which the frequencies are swept also increases. These three quantities vary by a factor 
of about three from the beginning to the end of a capture [43]. Furthermore, in the brown bat,
microstimulation of the superior colliculus produces not only movements of the head and pinna 
but also sonar calls, where the number of evoked sonar pulses increases as a function of both the 
duration and the amplitude of the injected current [47]. These data strongly suggest that relevant
motor neuron activity is indeed generally faster in the region where narrow tuning curves are 
found. 

The model may also be useful for understanding sensory responses associated with escape or 
evasive behaviors in which, as a potential threat approaches, the motor reaction should be faster. 
This is a behavioral-bias scenario: if the likelihood or the speed of an evasive movement increases 
monotonically as a function of the proximity and speed of an incoming object, then one should 
expect the driving sensory neurons to have monotonic profiles. This, indeed, is reported to happen 
in several systems. For example, flying locusts make a characteristic dive when predator-sized 
stimuli are looming on one side. The key for triggering the glide is thought to be a single movement 
detector unit, the so-called DCMD neuron, and this neuron fires with increasing frequency as the 
looming stimulus gets nearer [48]. Similar monotonic responses as functions of distance [49,50] and
speed ||T9] have also been documented in other neurophysiological preparations where escape or 
collision avoidance is important. Even in monkeys, neurons that are sensitive to the distance of an 
object approaching the face seem to have monotonic dependencies on object distance (see Figure 4 
in [51]). Likewise, cortical neurons that respond to optic flow, which are particularly useful for
avoiding obstacles during locomotion [52], encode heading speed (i.e., the speed of one's own
motion) in a predominantly monotonic way [53].

Perhaps the most counter-intuitive consequence of the model is that, when behavior does not 
require high accuracy, the sensory representation should be correspondingly coarse, even if, in prin- 
ciple, it could be made more precise. As illustrated in Figure 2B and 2F, when the motor response 
functions are broad, so should be the sensory tuning curves. An impressive data set collected by 
Heffner and colleagues supports this notion [54,55]. They have shown that sound localization capacity
in mammals varies tremendously, with discrimination thresholds ranging from about 1° in
humans and elephants to about 30° in mice and horses. These differences are not accounted for by 
variations in interaural distance, animal lifestyle, or environmental cues. Instead, "sound localiza- 
tion acuity in mammals appears to be a function of the precision required of the visual orienting 
response to sound" [54]. The argument is this. A primary purpose of auditory localization is to
generate an orienting response, i.e., to bring the sound source into the fovea for detailed visual 
analysis. Consequently, species with small areas of best vision (e.g., human, elephant) need to gen- 
erate highly precise movements, whereas species with large areas or streaks of best vision (e.g., 
mouse, horse) do not. Based on 24 mammalian species, the correlation between sound localiza- 
tion acuity and foveal width is 0.92. Crucially, however, the correlation with visual acuity itself 
is -0.31, so a purely sensory explanation again fails. According to classic sensory coding notions, 
species with high acuity must have tuning curves that are correspondingly narrower or less noisy. 
Therefore, in view of the behavioral data, large variations in the width of sound localization tuning 
curves are expected across species. These would be explained by motor constraints, as predicted 
by the theory. 

Similar interpretations should be possible in other systems, as long as sensory tuning curves 
can be directly related to clearly-defined behaviors. 

Conclusion

The model developed here is based on an optimality criterion for neuronal tuning curves that 
takes into account both sensory (upstream) and motor (downstream) processes. This simple model 
is useful when sensory responses can be functionally related to specific behaviors, in which case it 
may explain some features of sensory representations that appear intriguing from the traditional 
perspective of sensory coding based on information maximization. In particular, this approach 
provides a theoretical rationale for the existence of monotonic tuning curves, which so far have 
lacked a plausible explanation, and yields some insight into the apparently idiosyncratic varieties 
of sensory tuning curves observed across neurophysiological preparations. 

Methods 

Numerical Methods 

All calculations were performed using Matlab (The Mathworks, Natick, MA). Results are shown 
for n between 2 and 8 neurons and M = 50 stimuli. The mean response of neuron i as a function of 
stimulus k, where k = 1, ..., 50, was parameterized as follows:

$$\langle r_{ik} \rangle = \frac{a_i \left(1 + \exp(-h_i m_i)\right)^2}{\left(1 + \exp\left((-k + c_i - h_i)\, m_i\right)\right) \left(1 + \exp\left((k - c_i - h_i)\, m_i\right)\right)} \qquad (4)$$

where $a_i$ is the amplitude of the curve for neuron i, $c_i$ is the center point, $h_i$ the half-width, and
$m_i$ a factor that determines the slope. This expression produces either unimodal curves, which
may have positive or negative kurtosis, or monotonic curves, which may vary in steepness. The
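correlation matrix C was obtained by assuming that noise is independent across neurons, in which
case $C_{ij} = \sum_k s_k \langle r_{ik} r_{jk} \rangle$ with $\langle r_{ik} r_{jk} \rangle = \langle r_{ik} \rangle \langle r_{jk} \rangle + \delta_{ij} \sigma_{jk}^2$, where $\delta_{ij} = 1$ if i = j and is zero otherwise. For each neuron
i, the SD of the noise during stimulus k was $\sigma_{ik} = \alpha \left( r_{\max}/2 + \sqrt{\langle r_{ik} \rangle} \right)$, but other choices produced
similar results. In the low- and high-noise conditions, $\alpha$ = 0.05 and 0.5, respectively.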
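
As an illustration, the parameterization in Equation [4] can be evaluated directly. The sketch below is written in Python rather than the Matlab used for the reported calculations; the function name and all parameter values are hypothetical and were chosen only to show how the same expression can yield either a peaked or a monotonic profile over the 50 stimuli.

```python
import math

def tuning_curve(k, a, c, h, m):
    """Mean response at stimulus k (Equation 4): amplitude a, center c,
    half-width h, slope factor m. Names are illustrative, not the author's."""
    numerator = a * (1.0 + math.exp(-h * m)) ** 2
    denominator = ((1.0 + math.exp((-k + c - h) * m))
                   * (1.0 + math.exp((k - c - h) * m)))
    return numerator / denominator

M = 50
stimuli = range(1, M + 1)

# Center inside the range with a small half-width: a unimodal profile
# that reaches the amplitude a at k = c.
unimodal = [tuning_curve(k, a=40.0, c=25.0, h=5.0, m=0.5) for k in stimuli]

# Center at the upper edge with a large half-width: only the rising flank
# falls inside the stimulus range, so the profile is monotonic.
monotonic = [tuning_curve(k, a=40.0, c=50.0, h=25.0, m=0.5) for k in stimuli]
```

In this sketch a monotonic curve is simply a broad unimodal curve whose peak sits at the edge of the stimulus range, so both profile types share the same four parameters.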

Matrices $\Phi$ were produced directly by generating 5000 functions $F_\alpha$ within a class and averaging
the pairwise products $F_{\alpha k} F_{\alpha l}$. The functions in each class were determined by small
numbers of parameters. For example, for the saturating monotonic curves (Figure 2C),
$F_{\alpha k} = a + b \left(1 + \exp((c - k)/d)\right)^{-1}$, where, for each $\alpha$, the center point c and slope factor d were chosen
randomly within a range and a and b were set to satisfy two normalization conditions. The first one
was $\sum_{k=1}^{M} s_k F_{\alpha k} = 0$ for all $\alpha$, so the mean of each downstream function was set to zero. This was
simply to shift the baseline of each $F_\alpha$ and make the resulting $\Phi$ matrix easier to visualize in the
plots; it had little or no effect on the optimal tuning curves. The second normalization condition
was $\sum_{k=1}^{M} s_k \Phi_{kk} = 1$. It limited the amplitude of the downstream responses. Final values of $\Phi_{kl}$
varied depending on the chosen class of functions, but never exceeded the range [−2.4, 4].
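
The construction of the output correlation matrix can be sketched as follows. This is a minimal Python illustration, not the original Matlab routine. The parameter ranges for c and d, the random seed, and the smaller default number of functions are assumptions made for the example. The sketch also normalizes every function individually, which is one simple way (assumed here) of meeting the second condition on the averaged matrix.

```python
import math
import random

def sigmoid_class_phi(M=50, N=200, seed=0):
    """Average the pairwise products F_k * F_l over N random sigmoids.
    Ranges for c and d are illustrative assumptions."""
    rng = random.Random(seed)
    s = [1.0 / M] * M                      # uniform stimulus probabilities
    Phi = [[0.0] * M for _ in range(M)]
    for _ in range(N):
        c = rng.uniform(0.2 * M, 0.8 * M)  # center point
        # Positive d gives an increasing sigmoid, negative d a decreasing one.
        d = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 5.0)
        g = [1.0 / (1.0 + math.exp((c - k) / d)) for k in range(1, M + 1)]
        # Choose a and b in F = a + b*g so that sum_k s_k F_k = 0
        # and sum_k s_k F_k^2 = 1.
        mean_g = sum(sk * gk for sk, gk in zip(s, g))
        centered = [gk - mean_g for gk in g]
        norm = math.sqrt(sum(sk * x * x for sk, x in zip(s, centered)))
        F = [x / norm for x in centered]
        for k in range(M):
            for l in range(M):
                Phi[k][l] += F[k] * F[l] / N
    return Phi, s

Phi, s = sigmoid_class_phi()
trace_condition = sum(s[k] * Phi[k][k] for k in range(len(s)))
```

With this per-function normalization the quantity `trace_condition` equals 1 up to rounding, which is the second normalization condition stated above.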

Given the terms $s_k$ and $\Phi_{kl}$, a routine searched for the combinations of parameters a, h, c and
m in Equation [4] that minimized $E_B$ (Equation [8]). The minimization routine used the Nelder-Mead
downhill simplex method [56]. A set of tuning curves was deemed optimal only after extensive
testing and refining to ensure that the solution was near the global minimum. Additional constraints
were included by adding suitable penalty terms to $E_B$. For instance, to constrain the total
power, a term proportional to $\sum_{i=1}^{n} \sum_{k=1}^{M} s_k \langle r_{ik} \rangle^2$ was added.

Tuning curves were also generated using a second parameterization (Figure 3 and Supporting 
Information). In this case, Equation [4] was substituted with a combination of two half-Gaussians
with different widths and baselines but a common peak [33],

$$\langle r_{ik} \rangle = \begin{cases} b_i^- + (a_i - b_i^-) \exp\left(-(k - c_i)^2 / (h_i^-)^2\right) & \text{if } k < c_i \\ b_i^+ + (a_i - b_i^+) \exp\left(-(k - c_i)^2 / (h_i^+)^2\right) & \text{if } k \geq c_i \end{cases} \qquad (5)$$

Here there are six free parameters per basis neuron: the center point $c_i$, amplitude $a_i$, left and right
baseline levels $b_i^-$ and $b_i^+$, and left and right widths $h_i^-$ and $h_i^+$.
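
A short Python sketch of this second parameterization is given below for illustration. The function name, argument names and example values are hypothetical. Stimuli at the peak are assigned to the right-hand branch, which is harmless because both branches equal the amplitude there.

```python
import math

def half_gaussian_curve(k, c, a, b_left, b_right, h_left, h_right):
    """Two half-Gaussians sharing a peak of height a at k = c (Equation 5),
    with separate baselines and widths on each side."""
    if k < c:
        baseline, width = b_left, h_left
    else:
        baseline, width = b_right, h_right
    return baseline + (a - baseline) * math.exp(-((k - c) ** 2) / width ** 2)

# An intermediate profile: it falls to 0 on the left of the peak but only
# to half the amplitude on the right.
curve = [
    half_gaussian_curve(k, c=25, a=40.0, b_left=0.0, b_right=20.0,
                        h_left=4.0, h_right=6.0)
    for k in range(1, 51)
]
```

Setting one baseline equal to the amplitude makes that side flat and the curve monotonic, whereas equal baselines give a unimodal curve, so this form can express profiles intermediate between the two types.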

Derivation of $E_B$

The objective here is to derive an expression that quantifies how well the sensory tuning curves
$\langle r \rangle$ approximate a desired set of downstream responses F. The standard procedure is to consider
the average squared difference between driven and desired responses, i.e., the norm |R − F|. Because
R = wr, this produces

$$E_L = \frac{1}{N} \sum_{\alpha=1}^{N} \sum_{k=1}^{M} s_k \left\langle \left( \sum_{i=1}^{n} w_{\alpha i} r_{ik} - F_{\alpha k} \right)^2 \right\rangle \qquad (6)$$

where the coefficient $s_k$ represents the probability of stimulus k, such that $\sum_{k=1}^{M} s_k = 1$, and the
average indicated by angle brackets is over repeated presentations of a given stimulus. $E_L$ is the
linear approximation error. This number quantifies how accurately the sensory (basis) neurons and
associated synaptic weights are able to generate the desired motor activity downstream. 

The next step is to obtain an expression for the error that no longer depends on the synaptic 
connections. To do this, the idea is to find the set of synaptic weights $w^{\mathrm{opt}}$ that minimize $E_L$
assuming that the sensory tuning curves are known. These optimal weights are then substituted
back into Equation [6] and the result is an expression for the mean square error that assumes that
the synaptic connections are always the best possible ones. This is as follows.

First, find the optimal connections by calculating the partial derivatives of $E_L$ in Equation [6]
with respect to $w_{\alpha i}$ and equating the result to zero. This gives the optimal weights

$$w^{\mathrm{opt}}_{\alpha i} = \sum_{k=1}^{M} \sum_{j=1}^{n} s_k F_{\alpha k} \langle r_{jk} \rangle C^{-1}_{ji} \qquad (7)$$

where $C^{-1}$ is the inverse of C, and $C_{ij} = \sum_{k=1}^{M} s_k \langle r_{ik} r_{jk} \rangle$ is the correlation between sensory neurons
i and j. The weights $w^{\mathrm{opt}}$ generate linearly-driven responses R that, on average, approximate
the target motor responses as accurately as possible given the mean firing rates of the sensory neu- 
rons and the statistics of the stimuli. 

Having minimized $E_L$ with respect to the connections, next, find out how big it is by substituting
the optimal weights of Equation [7] back into Equation [6] and rearranging terms. Calling the
result $E_B$, this gives

$$E_B = 1 - \sum_{i=1}^{n} \sum_{j=1}^{n} Q_{ij} C^{-1}_{ij} \qquad (8)$$

where

$$Q_{ij} = \sum_{k=1}^{M} \sum_{l=1}^{M} s_k s_l \langle r_{ik} \rangle \Phi_{kl} \langle r_{jl} \rangle \qquad (9)$$

and $\Phi_{kl} = \frac{1}{N} \sum_{\alpha=1}^{N} F_{\alpha k} F_{\alpha l}$, as mentioned in the main text. Thus, $E_B$ is a function of the first and
second moments of the sensory responses, the stimulus probabilities, and the output correlations $\Phi$.

Importantly, note that in the expression above, the following normalization condition was imposed:
$\sum_{k=1}^{M} s_k \Phi_{kk} = 1$. This limits the amplitude of the downstream functions and bounds $E_B$
between 0 and 1. That the error cannot be negative follows directly from the definition of $E_L$ above.
That it is bounded by 1 is not immediately obvious, but is a consequence of the fact that the eigenvalues
of $\Phi$ are non-negative, and the normalization restricts their total sum. See the next section
for more details.

Lower Bounds on the Approximation Error 

In this section, the two analytic results discussed in the main text, Equations [2] and [3], are developed.
The main expression derived below is, in fact, a slightly more general statement about the
accuracy of the sensory tuning curves. 

Here, a key simplifying assumption is that all stimulus probabilities are equal, so $s_k = 1/M$ for
all k. As a consequence, the maximum eigenvalue of $\Phi$ satisfies $1 \leq \lambda_{\max} \leq M$. This important
property is true for two reasons, first, because $\Phi$ results from the product of a matrix times its
transpose, which guarantees that all its eigenvalues are non-negative, and second, because of the
normalization condition on $\Phi$, which in this case is

$$\sum_{k=1}^{M} s_k \Phi_{kk} = \frac{1}{M} \sum_{k=1}^{M} \Phi_{kk} = 1. \qquad (10)$$

This condition makes the sum over all eigenvalues equal to M. Hence, $\lambda_{\max}$ is bounded between 1
and M.

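To see how $E_B$ depends on the sensory responses, recall that their correlation matrix C is such
that $C_{ij} = \sum_{k=1}^{M} s_k \langle r_{ik} r_{jk} \rangle$. Then, for a single basis neuron (n = 1) with mean response $\langle r_k \rangle$, C
becomes a scalar $C = r^2 + \sigma^2$, where

$$r^2 = \frac{1}{M} \sum_{k=1}^{M} \langle r_k \rangle^2, \qquad \sigma^2 = \frac{1}{M} \sum_{k=1}^{M} \sigma_k^2. \qquad (11)$$

Also, if the tuning curve is an eigenvector of $\Phi$ with eigenvalue $\lambda$, then $\sum_{l=1}^{M} \Phi_{kl} \langle r_l \rangle = \lambda \langle r_k \rangle$, by
definition, and Equation [9] gives $Q = \lambda r^2 / M$. Substituting into Equation [8] and defining $\rho \equiv r^2 / \sigma^2$
leads to Equation [2], which is the approximation error for a single neuron.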
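
The single-neuron case can be checked numerically. The Python sketch below evaluates Equations [8] and [9] for n = 1 with uniform stimulus probabilities and compares the result with the expression that the substitution above yields, $1 - (\lambda/M)\,\rho/(1+\rho)$, assumed here to be the content of Equation [2]. The sinusoidal downstream function, the scaling factor 3 and the noise variance are arbitrary choices made for the example. The curve takes negative values, so it is a mathematical test case and not a realistic firing rate.

```python
import math

M = 50
# A single downstream function (N = 1), scaled so that
# sum_k s_k * Phi_kk = 1 when s_k = 1/M.
raw = [math.sin(2.0 * math.pi * k / M) for k in range(1, M + 1)]
scale = math.sqrt(M / sum(f * f for f in raw))
F = [scale * f for f in raw]
Phi = [[F[k] * F[l] for l in range(M)] for k in range(M)]

# F is an eigenvector of Phi with eigenvalue sum_k F_k^2 (equal to M here).
lam = sum(f * f for f in F)

def single_neuron_error(mean_r, sigma2):
    """E_B for one basis neuron, from Equations 8 and 9 with s_k = 1/M."""
    s = 1.0 / M
    r2 = sum(s * r * r for r in mean_r)   # stimulus-averaged squared mean
    C = r2 + sigma2                       # scalar correlation "matrix"
    Q = sum(s * s * mean_r[k] * Phi[k][l] * mean_r[l]
            for k in range(M) for l in range(M))
    return 1.0 - Q / C, r2

mean_r = [3.0 * f for f in F]   # tuning curve proportional to the eigenvector
sigma2 = 4.0                    # noise variance averaged over stimuli
E_B, r2 = single_neuron_error(mean_r, sigma2)
rho = r2 / sigma2               # signal-to-noise ratio
closed_form = 1.0 - (lam / M) * rho / (1.0 + rho)
```

Both routes give the same error in this example, and the value falls toward zero as the signal-to-noise ratio grows because the chosen eigenvalue equals M.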

With more neurons, it is possible to derive a lower bound on the error that is more general than 
Equation [3]. First, assume that $s_k = 1/M$ and that the noise has equal magnitude and is uncorrelated
across neurons, such that

$$C = \frac{1}{M} \langle r r^T \rangle = \frac{1}{M} \langle r \rangle \langle r \rangle^T + \sigma^2 I_n$$

$$Q = \frac{1}{M^2} \langle r \rangle \Phi \langle r \rangle^T \qquad (12)$$

where $I_n$ is the n × n identity matrix and $\sigma^2$, the variance averaged over stimuli, is the same for
all neurons. To proceed, consider the singular value decomposition (SVD) of the matrix of mean
responses, $\langle r \rangle = u S V^T$, where u is an n × n unitary matrix, V is an M × M unitary matrix, and
S is an n × M matrix with n singular values along the diagonal and zeros elsewhere [56]. This is
assuming that the n tuning curves are independent; if not, then the number of non-zero elements
of S will equal the number of independent curves (the rank of $\langle r \rangle$). The SVD is a generalization to
rectangular matrices of the classic eigenvalue decomposition. Substituting C and Q into Equation [8]
and using the SVD representation of $\langle r \rangle$ leads to

$$E_B = 1 - \frac{1}{M} \sum_{k=1}^{M} \left( V^T \Phi V S^T \left( S S^T + M \sigma^2 I_n \right)^{-1} S \right)_{kk} = 1 - \frac{1}{M} \sum_{k=1}^{M} \left( V^T \Phi V D \right)_{kk}. \qquad (13)$$

The first equality results from the defining property of unitary matrices, such that $u u^T = I_n$ and
$V V^T = I_M$. The second equality results from grouping into D all the terms involving S. The matrix
D turns out to be M × M and diagonal, with entries $D_k = S_k^2 / (S_k^2 + M \sigma^2)$, where a single index
is used to indicate relevant elements in diagonal matrices. Note, however, that only the first n 
diagonal elements are non-zero, because S itself only has at most n non-zero singular values along 
the diagonal (recall that S is diagonal but rectangular, n x M). The lower bound on the expression 
above thus involves a sum of only n terms; the bound is

$$E_B \geq 1 - \frac{1}{M} \sum_{i=1}^{n} \frac{\lambda_i S_i^2}{S_i^2 + M \sigma^2} \qquad (14)$$

where $\lambda_1, \ldots, \lambda_n$ are the n largest eigenvalues of $\Phi$.

To see this, first write $\Phi$ in terms of its eigenvalue decomposition, so that $V^T \Phi V = V^T E \Lambda E^T V$,
where E is the matrix of (right) eigenvectors of $\Phi$. Suppose that the eigenvalues $\lambda$ are sorted
in decreasing order, so that $\lambda_1$ is the largest. Now note that the diagonal elements of the matrix
$V^T \Phi V$ depend on the match between V and E. In particular, the best possible match occurs when
V is identical to E; then $V^T E \Lambda E^T V = \Lambda$ and the equality in Equation [14] follows directly from
Equation [13]. This means that equality is obtained when the basis tuning curves are constructed
using the eigenvectors of $\Phi$ sorted in decreasing order (i.e., V is equal to E). In contrast, if, for
example, V has the same columns as E but sorted in the reverse order, then the resulting sum is
similar to that in Equation [14] except that it involves the n smallest eigenvalues. That $E_B$ varies
between 0 and 1 follows from Equation [10].

Equation [14] is the main analytic result and provides important intuitions about the mean basis
responses, or sensory tuning curves, $\langle r \rangle$. These are as follows.

1. With n = 1 the result is Equation [2] written as an inequality, with the signal-to-noise ratio equal
to $S^2 / (M \sigma^2)$. Also, Equation [3] is obtained when $\sigma^2 = 0$.

2. Noise always increases the error, because $\sigma^2$ effectively decreases every eigenvalue in Equation [14].

3. Noise partially determines the optimal shapes of the tuning curves. For example, if $\lambda_1 > \lambda_2$
but $S_2 \gg S_1$, then the second eigenvector should take precedence over the first, because its
signal-to-noise ratio will be much higher. In other words, in this case the first column in V
should contain the second eigenvector of $\Phi$. Thus, noise also determines which eigenvectors
should be chosen in what order, and therefore the optimal shapes of the basis responses.

4. Because of the last point, noise helps solve the ambiguity discussed earlier, namely that the set
of basis responses is determined up to an invertible transformation. However, it does not
entirely solve the problem, and this is why. When $\sigma = 0$, both matrices u and S are absent
from Equation [14]. Therefore, they are arbitrary; they do not affect the error (as long as they
are unitary and diagonal, as required). In contrast, with noise there is a criterion for setting
the $S_i$ values, so only u remains arbitrary. Thus, without noise $\langle r \rangle$ is ambiguous up to an
invertible transformation, whereas with noise it is ambiguous up to a unitary transformation.

5. In addition, it is important to mention that this ambiguity remains when the stimulus probabilities
are not uniform. When arbitrary probability values $s_k$ are included in the calculation
of Equation [13], the resulting expression for the error still does not depend on u. Therefore,
manipulating the stimulus statistics does not solve this problem.

6. Finally, to minimize $E_B$ in the presence of noise it is best to increase $S_i^2$ as much as possible.
However, the total power in the mean responses is $\sum_{i=1}^{n} \sum_{k=1}^{M} \langle r_{ik} \rangle^2 = \sum_{i=1}^{n} S_i^2$. Therefore,
with noise, additional constraints that effectively limit the power are necessary to obtain optimal
responses of finite amplitude.
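
The points above can be explored with a small numerical sketch. The Python function below assumes that the bound has the form $E_B \geq 1 - \frac{1}{M} \sum_{i=1}^{n} \lambda_i S_i^2 / (S_i^2 + M \sigma^2)$, which is the form consistent with remarks 1 and 2. The function name and the example eigenvalues and singular values are hypothetical.

```python
def error_lower_bound(eigenvalues, singular_values, M, sigma2):
    """Lower bound on E_B under the assumed form of Equation 14.

    The n largest eigenvalues (sorted in decreasing order) are paired with
    the singular values in the order given. Singular values must be positive
    when sigma2 is zero, otherwise a term is 0/0.
    """
    n = len(singular_values)
    largest = sorted(eigenvalues, reverse=True)[:n]
    total = 0.0
    for lam, S in zip(largest, singular_values):
        total += lam * S * S / (S * S + M * sigma2)
    return 1.0 - total / M

M = 50
eigenvalues = [30.0, 15.0, 5.0]   # sum equals M, as Equation 10 requires
singular_values = [10.0, 10.0]    # n = 2 basis neurons

noiseless = error_lower_bound(eigenvalues, singular_values, M, sigma2=0.0)
noisy = error_lower_bound(eigenvalues, singular_values, M, sigma2=2.0)
```

In this example the noiseless bound is 1 − 45/50 = 0.1, and with $\sigma^2 = 2$ each eigenvalue is effectively halved, which raises the bound to 0.55.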

Supporting Information 

Supporting information is appended to this preprint. It is also found online at DOI:
10.1371/journal.pbio.0040387.sd001 (154 KB PDF).
Supporting Information



Main text: Salinas E (2006) How behavioral constraints may determine optimal sensory representations.
PLoS Biology 4(12): e387. DOI: 10.1371/journal.pbio.0040387.

Parameter Manipulations 

Figure S1 shows additional results in which the same four classes of target functions shown in
Figure 2 were used but other aspects of the computer experiments were varied. In Figure S1A,
the noise of the basis responses was much higher. This caused all tuning curves to rise and fall
more steeply than with low noise. This makes sense because, to increase the signal-to-noise ratio
as much as possible, higher firing rates are necessary. In Figure S1B, a term penalizing the total
power of the responses was added to the error (see Methods), which amounts to putting an energy
cost on the neural activity. This tends to favor unimodal neurons, which typically have smaller
mean firing rates across all stimuli than monotonic ones. In this case, unimodal curves reduced
their amplitudes and monotonic curves shifted toward the edges. Also, a monotonic curve was
exchanged for a unimodal one. Such exchanges occur only when adding a neuron results in a large
power penalty, larger than the corresponding decrease in $E_B$. In the examples shown, $E_B$ was
already very low with 7 neurons, so the advantage in accuracy of an additional monotonic curve
was too small relative to its high mean rate. In Figure S1C, the stimulus probabilities were not
uniform. Instead, they had a Gaussian profile centered on the middle stimulus (see Figure S4F).
This caused the center points of the tuning curves to shift very slightly toward the most frequent
stimuli. The effect, however, was much stronger in conjunction with the other manipulations, as
shown in Figure S1D. Here, all three changes (high noise, high power cost, and unequal stimulus
probabilities) were applied simultaneously. The tuning curve locations varied strongly but their
shapes remained qualitatively the same. Finally, to investigate whether the results depended on the
specific tuning curve parameterization that was chosen, all numerical experiments were repeated
with a second parameterization that allowed tuning curve profiles intermediate between unimodal
and monotonic; this was the same parameterization used by Hinkle and Connor [33] (see Methods
and Figure 3B). In all cases, the results were similar to those obtained earlier. Figure S1E shows an
example in which the same conditions of Figure S1D were used. Although the individual shapes
are slightly different, as expected, all curves are either unimodal or monotonic; no intermediate
curves were generated.

[Figure S1; horizontal axes: stimulus number, up to 50]

Figure S1. Sets of eight optimal tuning curves obtained under various conditions. Columns correspond to the
same classes of downstream functions as in Figure 2A-D. Numerical experiments were as in Figure 2, except
for changes stated explicitly. (A) High noise. The SD of the firing rates was ten times as high as in Figure 2.
(B) High power cost. A penalty term proportional to the sum of the squared responses was added to $E_B$. (C)
Unequal stimulus probabilities. The probability profile ($s_k$ as a function of k) was Gaussian with an SD of
7. The stimulus in the middle of the range was about ten times more frequent than those at the edges. (D)
Combined conditions. The three previous manipulations were applied simultaneously. (E) As in (D), but with
the tuning curves parameterized in an alternative way (Equation [5]).

In summary, then, the results are not overly sensitive to noise, power constraints, stimulus 
probabilities, or the specific way in which tuning curves are defined mathematically. 

An Alternative Set of Constraints 

A different approach was also explored in which Equation [6] was directly minimized with respect
to w and $\langle r \rangle$ using a modified gradient descent algorithm. In this case, both w and $\langle r \rangle$ were
updated iteratively. Tuning curves were constrained to vary between 0 and 40, which was enforced
by renormalizing all modified curves after every update. In addition, the synaptic connections
were forced to be sparse by adding to Equation [6] a penalty term proportional to

$$\sum_{\alpha=1}^{N} \sum_{i, j \neq i}^{n} \left| w_{\alpha i} w_{\alpha j} \right| + \sum_{i=1}^{n} \sum_{\alpha, \beta \neq \alpha}^{N} \left| w_{\alpha i} w_{\beta i} \right|. \qquad (15)$$

Here, the first term represents the synaptic redundancy across rows of w, whereas the second term 
represents the redundancy across columns. For instance, the penalty term for row $\alpha$ is $\sum_{i, j \neq i} |w_{\alpha i} w_{\alpha j}|$.
This expression is minimized when only one of the connections to downstream neuron $\alpha$ is not zero,
in which case the actual value of the non-zero weight does not matter. This type of penalty thus 
tends to produce many synaptic weights equal to zero. When the number of basis neurons is equal 
to the number of downstream neurons, this constraint assigns one basis neuron to one downstream 
neuron (if no other restrictions are imposed), making the connectivity matrix equivalent to the unit 
matrix. Thus, an intuitive interpretation of this constraint is "construct each downstream response 
using as few basis neurons as possible". 
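
The sparseness penalty can be sketched in a few lines of Python. The function below sums the absolute products of distinct weights within each row and within each column of w, following the description above. The function name and the example matrices are hypothetical, and the weight matrix is stored as a list of rows (one row per downstream neuron).

```python
def sparseness_penalty(w):
    """Row plus column redundancy of a weight matrix w[alpha][i], where
    alpha indexes downstream neurons and i indexes basis neurons."""
    N, n = len(w), len(w[0])
    # Redundancy across rows: pairs of distinct connections onto the same
    # downstream neuron.
    rows = sum(abs(w[al][i] * w[al][j])
               for al in range(N)
               for i in range(n) for j in range(n) if j != i)
    # Redundancy across columns: pairs of distinct connections made by the
    # same basis neuron.
    cols = sum(abs(w[al][i] * w[be][i])
               for i in range(n)
               for al in range(N) for be in range(N) if be != al)
    return rows + cols

one_to_one = [[5.0, 0.0], [0.0, -2.0]]   # one basis neuron per downstream neuron
dense = [[1.0, 1.0], [1.0, 1.0]]         # every neuron connected to every other

penalty_one_to_one = sparseness_penalty(one_to_one)
penalty_dense = sparseness_penalty(dense)
```

The one-to-one matrix gives a penalty of zero regardless of the size of its non-zero weights, whereas the dense matrix is penalized, which is the behavior described in the text.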

The results that follow were obtained by minimizing $E_L$ (Equation [6]) plus the expression above
multiplied by a constant that determined the penalty strength. To use as few restrictions as possible,
the motor responses F and the sensory tuning curves were made to vary between 0 and 40; no other
normalization conditions were invoked. 

Figure S2 illustrates the effect of the sparse connectivity constraint. Here, F consisted of increas- 
ing and decreasing sigmoidal functions, as in Figure 2C. Without the sparseness constraint, the 
distribution of optimal connections that results is approximately normal (Figure S2E), and the cor- 
responding optimal basis functions have multiple peaks and no particular structure (Figure S2B). 
This is because, as mentioned in the main text, without additional constraints on w or $\langle r \rangle$ the minimization
of $E_L$ is an ill-posed problem; there is no unique solution. In contrast, with the sparseness
constraint in place, a large fraction of the synaptic connections become zero (Figure S2F), and the 
optimal basis responses are uniformly spaced and almost perfectly monotonic (Figure S2C). These 
effects increase with the strength of the penalty term (Figure S2D and S2G). 

To quantify the monotonicity of the resulting basis responses, the derivative of each tuning 
curve was computed numerically, and a monotonicity index for each curve was calculated from it. 
Defining $d_{ik} = \langle r_{i,k+1} \rangle - \langle r_{ik} \rangle$, the monotonicity index for cell i is

$$MI_i = \frac{\sum_k d_{ik}}{\sum_k |d_{ik}|}. \qquad (16)$$

This number goes from −1 for a monotonically decreasing curve to +1 for a monotonically increasing
curve, with values near 0 indicating about equal numbers of increasing and decreasing steps.
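
An index with these properties can be sketched as follows. The Python code assumes that the index is the sum of the steps divided by the sum of their absolute values. This is an assumption of the sketch, since other normalizations would also range from −1 to +1. The function name and the convention of returning 0 for a flat curve are likewise illustrative.

```python
def monotonicity_index(mean_r):
    """Net change divided by total absolute change across successive stimuli.
    Assumed form; returns 0.0 for a perfectly flat curve."""
    steps = [b - a for a, b in zip(mean_r, mean_r[1:])]
    total = sum(abs(d) for d in steps)
    if total == 0.0:
        return 0.0
    return sum(steps) / total

increasing = monotonicity_index([1.0, 2.0, 3.0, 5.0])
decreasing = monotonicity_index([5.0, 3.0, 1.0])
oscillating = monotonicity_index([0.0, 1.0, 0.0, 1.0, 0.0])
```

Under this assumed form the three example curves give +1, −1 and 0, respectively.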

Figure S2. Optimal tuning with sparse connectivity. Results are shown for F with 2000 downstream functions
and sets of 10 basis functions. (A) Four examples of desired downstream responses. (B) Four examples of
optimal tuning curves obtained with no connectivity constraint (penalty strength was zero). (C, D) Optimal
tuning curves obtained with penalty strengths of 0.002 and 0.02, respectively. (E-G) Distributions of synaptic
weight values for penalty strengths of 0, 0.002 and 0.02, respectively. (H-J) Distributions of monotonicity
indices for penalty strengths of 0, 0.002 and 0.02, respectively.

Figure S2H-J shows how the distribution of monotonicity indices changes as the strength of the 
sparseness constraint is increased. To construct the histograms, optimal sets of ten tuning curves 
were obtained six times with different initial conditions, producing slightly different sets. The 60 
indices were then pooled to produce the shown distributions. The values are clustered near 0 when
the basis tuning curves oscillate and reach ±1 when they are monotonic. 

The results in Figure S2 show that the sparse-connectivity constraint is effective at disambiguat- 
ing the shapes of the optimal tuning curves and may lead to monotonic profiles. Importantly, 
however, monotonic curves are produced only when the target downstream functions are them- 
selves monotonic or have monotonic components. This is shown in Figure S3. When the desired 
downstream responses are oscillatory and lack any directional bias, the resulting basis functions 
are themselves oscillatory, and their monotonicity indices are clustered around zero (Figure S3A
and S3B). Most interestingly, for the downstream functions used in Figure 3, which model hypo- 
thetical reactions to binocular disparity signals, the result is again an intermediate representation 
where the optimal tuning curves have various degrees of monotonicity and thus a widely-spread 
distribution of monotonicity indices (Figure S3E). 

Influence of Stimulus Statistics 

One particular question that was thoroughly investigated using this alternative set of con- 
straints was whether the statistics of the stimuli could affect the tendency of the basis neurons 
to develop monotonic responses. The left-most column in Figure S4 shows four probability profiles 
that were tested. The flat one at the top is the standard, where all stimuli are equally probable 
($s_k = 1/M$). In the other three, some stimuli are a lot more frequent than others. Sets of ten optimal
basis responses were obtained with each profile when the goal was to approximate either oscilla- 
tory or monotonic downstream functions (Figure S4A and S4B). Figure S4 shows results obtained 
using a high level of noise, which tended to enhance the effects of the probabilities, as observed 
earlier with the parametric approach (Figure S1C and S1D). As can be seen by comparing the resulting
tuning curves in different rows, the basis responses deemed optimal did adapt according to
the probabilities. In particular, with non-monotonic downstream responses the variations in tuning 
curve amplitude were much smaller in the regions where stimuli had low probabilities. And with
monotonic downstream responses it was the spread of the tuning curves that varied appreciably. 
However, in both cases the degree of monotonicity was barely affected by the stimulus
probabilities. The two columns under Figure S4A show that when the target functions are
non-monotonic the optimal tuning curves should also be non-monotonic, regardless of the
probabilities s_k. Similarly, the two columns under Figure S4B indicate that when the target
functions are monotonic the optimal tuning curves should also be predominantly increasing or
decreasing, and the probabilities s_k barely make a difference in this regard. Therefore, it seems
that monotonic tuning curves cannot be generated simply by manipulating the stimulus statistics.
Furthermore, if monotonic curves are optimal because of downstream requirements, it seems that
the statistics of the stimuli cannot override this trend.

Figure S3. Optimal tuning curves for five classes of motor functions. Results are as in Figure S2,
with a penalty strength of 0.002. Each row shows results for one class of downstream functions.
Each panel in the left column shows four representative examples of functions of a given type; the
middle column shows the resulting optimal basis responses; the right column shows their
distributions of monotonicity indices. (A) Low-frequency oscillating functions. For clarity, only
four of ten tuning curves are shown. (B) As in (A), but for high-frequency oscillating functions.
(C) Increasing and decreasing sigmoidal curves (as in Figure S2C). (D) Exponential functions.
(E) Motor functions that oscillate but have an increasing or decreasing trend (as in Figure 3).

Match between Analytic and Numerical Results 

When a specific correlation matrix Φ is chosen, Equations [3] and [14] make a prediction about
how the mean square error should decrease as a function of the number of tuning curves (basis 
functions) used for approximating the corresponding downstream responses. Verifying that the 
numerical results match these expressions is important, first, to check that the minimization rou- 
tines used to find the optimal basis functions are working properly, and second, to investigate 
whether few basis functions may indeed be sufficient to approximate accurately a large number of 
downstream responses, as implied by the analysis. 

Figure S4. Effect of stimulus statistics. Sets of ten optimal basis responses were obtained for
oscillatory (A) and monotonic (B) downstream functions using four stimulus probability profiles
(left-most column). A profile corresponds to the probability s_k as a function of stimulus k. Each
condition was repeated 6 times to obtain 60 monotonicity indices. Optimal sensory responses and
histograms of monotonicity indices are shown for each condition, as indicated in the respective
columns. (C) Uniform profile. All stimuli were equally frequent. (D) Linear profile. Stimulus
probability increased steadily. (E) Step profile. Stimulus probability changed abruptly at
stimulus 25. (F) Gaussian profile. Stimuli in the middle of the range were much more probable than
at the edges. All results are as in Figure S3 (penalty strength of 0.002), except that high noise
was used (q = 0.5). Monotonicity indices were highly insensitive to stimulus probabilities.

To address this, a special set of downstream responses was generated which, by construction,
could be approximated exactly by the family of parameterized curves (single-peak or monotonic)
used in Figure 2. This was done in three steps. (1) An arbitrary set of six peaked and monotonic
curves was generated using the regular parameterization equation; these six curves are shown in
Figure S5A. (2) A set of control downstream functions was constructed by adding the six curves in
random proportions. Thus, each downstream response was equal to a linear combination of the six
curves with coefficients drawn randomly from [-1, 1]. Four examples of such downstream functions
are shown in Figure S5B, along with the resulting Φ correlation matrix. (3) This control Φ matrix
was input to the minimization routine, which searched for the sets of 1, 2, ..., 6 optimal tuning
curves that minimized E_B. Note that the minimization routine had no information about the
downstream functions other than Φ. The set of six optimal tuning curves (n = 6) found by the
routine is shown in Figure S5C. They are almost identical to the original set used to construct the
control Φ. The mean square error as a function of the number of optimal tuning curves used in the
approximation (n) is shown in Figure S5D. The actual error values (red circles) are superimposed on
the minimum values (black dots) expected from Equation [3]. The numbers are very close, showing
that the minimization routine indeed found the best possible basis functions. They are not exactly
equal, particularly for the first three points, because the shapes of, say, the best two basis
functions (n = 2) cannot be perfectly fit by the parameterization equation. The error, however, is
virtually zero for n = 6, where by construction the parameterized curves can indeed reproduce the
best basis functions exactly.
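
The logic of this control can be illustrated with a short numerical sketch. It assumes that the minimum error of Equation [3] has the standard principal-component form, namely the sum of the eigenvalues of Φ that are left out when only the n largest are kept, and that Φ is the average outer product of the downstream functions. The six generating curves, the number of stimulus values (M = 50) and all variable names are illustrative assumptions, not the curves of Figure S5A.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 50                                   # assumed number of stimulus values
x = np.linspace(0.0, 1.0, M)

# Six hypothetical generating curves: three peaked, three monotonic.
peaks = [np.exp(-0.5 * ((x - c) / 0.1) ** 2) for c in (0.25, 0.5, 0.75)]
sigmoids = [1.0 / (1.0 + np.exp(-(x - c) / 0.08)) for c in (0.3, 0.5, 0.7)]
curves = np.array(peaks + sigmoids)      # shape (6, M)

# 5000 control downstream functions: random combinations, coefficients in [-1, 1].
coeffs = rng.uniform(-1.0, 1.0, size=(5000, 6))
F = coeffs @ curves                      # shape (5000, M)

# Assumed form of the correlation matrix: average outer product over functions.
Phi = F.T @ F / F.shape[0]               # shape (M, M)

eigvals, eigvecs = np.linalg.eigh(Phi)   # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

def predicted_error(n):
    """Sum of the eigenvalues discarded when the n largest are kept."""
    return eigvals[n:].sum()

def actual_error(n):
    """Mean squared residual (summed over stimuli) using the top-n eigenvectors as basis."""
    basis = eigvecs[:, :n]
    residual = F - F @ basis @ basis.T
    return np.mean(np.sum(residual ** 2, axis=1))

predicted = [predicted_error(n) for n in range(1, 7)]
actual = [actual_error(n) for n in range(1, 7)]
```

Because every downstream function here lies in the span of six curves, Φ has rank at most six, and both error estimates vanish (to numerical precision) at n = 6. This sketch uses unconstrained eigenvectors as basis functions, so it does not reproduce the small mismatch at low n that arises when the basis is restricted to a parameterized family.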

What happens with other correlation matrices Φ? An example is shown in Figure S5E, which
plots the mean square error (red circles) obtained in approximating the sigmoidal motor responses
of Figure 2C with the sets of 2, 4 and 8 tuning curves shown in Figure 2G. Black dots are again
the minimum values obtained from Equation [3], except that now the eigenvalues of the Φ matrix
in Figure 2C were used. Note that the errors are generally small; just two basis functions (n = 2)
capture more than 85% of the variance of the downstream responses. This is because in this case
all the motor responses are rather similar to each other. In contrast, Figure S5F shows an analogous
plot that was generated using the Φ matrix in Figure 2A, which resulted from oscillatory motor
responses. In this case, both the minimum expected error (black dots) and the actual approximation
error obtained with 2, 4 or 8 tuning curves (red circles) start much higher. The difference between
them, however, is again small.
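
The contrast between the two families can be sketched numerically by comparing how much of the total variance the two largest eigenvalues of Φ capture in each case. The sigmoidal and oscillatory families below, their parameter ranges, and the helper names are hypothetical stand-ins for the functions of Figure 2C and 2A, and Φ is again assumed to be the average outer product of the downstream functions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 50, 5000                          # assumed grid size and number of functions
x = np.linspace(0.0, 1.0, M)

def correlation_matrix(F):
    """Average outer product of the rows of F (assumed form of the matrix)."""
    return F.T @ F / F.shape[0]

def fraction_captured(Phi, n):
    """Fraction of the total variance carried by the n largest eigenvalues."""
    eigvals = np.linalg.eigvalsh(Phi)[::-1]
    return eigvals[:n].sum() / eigvals.sum()

# Hypothetical sigmoidal family: increasing and decreasing curves.
centers = rng.uniform(0.3, 0.7, size=(K, 1))
widths = rng.uniform(0.05, 0.2, size=(K, 1))
rising = 1.0 / (1.0 + np.exp(-(x - centers) / widths))
flip = rng.random(size=(K, 1)) < 0.5
F_sigmoidal = np.where(flip, 1.0 - rising, rising)

# Hypothetical oscillatory family: random frequency and phase.
freqs = rng.uniform(1.0, 4.0, size=(K, 1))
phases = rng.uniform(0.0, 2.0 * np.pi, size=(K, 1))
F_oscillatory = np.sin(2.0 * np.pi * freqs * x + phases)

frac_sigmoidal = fraction_captured(correlation_matrix(F_sigmoidal), 2)
frac_oscillatory = fraction_captured(correlation_matrix(F_oscillatory), 2)
```

With these assumed families, the two leading eigenvalues account for a much larger share of the variance for the sigmoidal functions than for the oscillatory ones, which is the qualitative pattern described above; the specific percentages depend on the parameter ranges chosen here.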

Figure S5. Mean squared error in the approximation versus theoretical minimum. (A) Set of functions
used to construct the downstream responses shown in (B), which served as controls. Each of 5000
downstream curves was generated as a linear combination of the six responses shown in (A), with
random coefficients. The resulting correlation matrix Φ is in the inset. (C) Set of six optimal
tuning curves recovered by the minimization routine. The curves shown minimized E_B given the
control matrix Φ in panel (B) and zero noise. (D) Mean squared error E_B as a function of the
number of optimal tuning curves, n, when the Φ matrix in (B) was used. Black dots are the minimum
errors calculated from Equation [3]. Red circles are the E_B values obtained (from Equation [8])
with the indicated number of basis tuning curves. (E) As in (D), but when the Φ matrix for
sigmoidal functions shown in Figure 2C was used. Red circles correspond to the sets of 2, 4 and 8
curves shown in Figure 2G. (F) As in (E), but when the Φ matrix for oscillatory functions shown in
Figure 2A was used. Red circles correspond to the sets of 2, 4 and 8 curves shown in Figure 2E.
(G) Optimal tuning curves for the oscillatory functions shown in Figure 2A were computed using the
alternative method with no restriction on the synaptic weights (sparseness not enforced). Red
circles are for optimal tuning curves only constrained to have a maximum of 1. Blue triangles are
for curves constrained to have a maximum of 1 and a minimum of 0. Black dots are the same as in
(F). No noise was used in (D) and (G).

In Figure S5E and S5F the red circles and black dots differ for two reasons: first, because the
noise was not zero, and any noise increases the approximation error (Equation [14]); and second,
because the parameterized curves could not match exactly the shapes of the optimal tuning curves.
In all the examples studied, the error due to this mismatch in shape was equivalent to about one
tuning curve. Thus, the error obtained with eight tuning curves was approximately equal to the
minimum error calculated with seven eigenvalues, and so on. This is best illustrated in Figure S5G,
which was constructed as follows. The black dots are the minimum errors for approximating the
oscillatory motor responses of Figure 2A; they are the same as in Figure S5F. The other data points
are the errors obtained with n tuning curves found using the alternative method mentioned in the
previous section. That is, gradient descent was used to find sets of n tuning curves and connection
weights that minimized E_L. Here, no noise was used and no restrictions whatsoever were placed on
the synaptic weights. The red circles were obtained when the only constraint on the tuning curves
was that their maximum had to be equal to 1. This did not limit the range of tuning curve values
(which could be negative), nor the shape of the profiles, so the approximation errors were virtually
equal to the theoretical minima. On the other hand, the blue triangles were obtained with the same
method, but when the tuning curves were constrained to have a minimum of 0 and a maximum of 1.
This limited their range of values and caused a small additional error. However, with more than
4 or 5 tuning curves, the approximation error with n tuning curves was almost exactly equal to the
minimum (theoretical) error with n - 1 curves.

In conclusion, the families of downstream responses studied here can indeed be described by a 
few eigenvalues and eigenvectors. As a consequence, few tuning curves may be enough to approx- 
imate a large variety of motor responses even when the range of tuning curve shapes is limited. 

Note on the Number of Motor Responses 

The Φ matrices shown here were produced by averaging 5000 downstream functions. Such a
large number was used so that, for a given class of functions, the corresponding Φ matrices would
change little from run to run. With this number, two matrices generated with different groups of
5000 functions from the same class were virtually indistinguishable, and so were their eigenvalues.
This was convenient for eliminating a potential source of variability across runs, and this in turn
was important to ensure that the results were repeatable and that the solution was not a local
minimum of E_B. However, adequate (smooth) Φ matrices could also be obtained with 200 downstream
functions or fewer, depending on their type.
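
The dependence of run-to-run variability on the number of averaged functions can be checked with a small sketch. It assumes, as an illustration only, an oscillatory family with random frequency and phase, a grid of 50 stimulus values, and Φ computed as the average outer product of the sampled functions; the helper names and seeds are hypothetical.

```python
import numpy as np

def phi_from_samples(num_functions, rng, M=50):
    """Estimate the correlation matrix from a fresh sample of hypothetical oscillatory functions."""
    x = np.linspace(0.0, 1.0, M)
    freqs = rng.uniform(1.0, 4.0, size=(num_functions, 1))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(num_functions, 1))
    F = np.sin(2.0 * np.pi * freqs * x + phases)
    return F.T @ F / num_functions

def relative_difference(num_functions, seed_a, seed_b):
    """Relative Frobenius-norm difference between two independent estimates."""
    A = phi_from_samples(num_functions, np.random.default_rng(seed_a))
    B = phi_from_samples(num_functions, np.random.default_rng(seed_b))
    return np.linalg.norm(A - B) / np.linalg.norm(A)

diff_200 = relative_difference(200, seed_a=1, seed_b=2)
diff_5000 = relative_difference(5000, seed_a=1, seed_b=2)
```

Since the sampling error of an average shrinks roughly as the inverse square root of the sample size, the two estimates based on 5000 functions should differ by roughly a fifth of the amount seen with 200 functions. Whether 200 functions yield an adequately smooth matrix depends on the family, as noted above.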

The condition that there should be more motor neurons than sensory neurons (n < N) is crucial 
for the present framework. It is a reasonable assumption if one considers that a given sensory stim- 
ulus, say, a phone ringing, may trigger a variety of motor actions, such as answering the phone, 
ignoring it, listening to the answering machine to find out what the caller wants, etc. So, com- 
bined with the different contexts in which a stimulus may appear, the number of associated motor 
responses may be very large. 